Interactive Runtime: Queue, Interject, Thinking, etc works. #257
Nate0-1999 wants to merge 63 commits into mpfaffenberger:main from
Conversation
…ctions, and eliminate noise logs for a truly smooth UX
…t spacing" This reverts commit 527230f.
Interject works laggy
Back out the most recent interject banner rendering experiment while keeping the safer queue/interject runtime path intact. This isolates the shell/queue freeze and missing-visible-interject regressions for validation against capture artifacts. Made-with: Cursor
revert latest interject visibility tweak
…ig of cleaning up and it's still laggy but great improvement. Made-with: Cursor
sometime the queue doesn't trigger and the interject printing has a b…
Restore shell streaming during foreground commands; harden queueing and interject messages.
Create v1 stable checkpoint
Restore interactive autosave triggers
…ting Serialize above-prompt rendering
Harden parity for busy attachments
Add chooser edit and escape paths
Codex/sync upstream main
Make chooser input modal
…policy Refine shell, wiggum, and paused queue behavior
Codex/upstream pr prep
Improve prompt-surface streaming previews
Lean terminal compat and protect foreground ephemeral UI
Codex/upstream main sync surgical
📝 Walkthrough

Adds an always-on prompt surface and runtime, routing streamed agent/tool output to prompt-local ephemeral UI; introduces BackgroundInteractiveCommand for background OAuth/work, cooperative cancellation primitives, legacy-to-bus bridging, terminal capability utilities, and extensive tests reflecting the new interactive flows.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant PromptSurface as Prompt Surface
    participant CLI as CLI Runner
    participant Agent
    participant EventStream as Event Stream Handler
    participant Renderer as Rich Renderer / Bus
    User->>PromptSurface: submit (submit/queue/interject)
    PromptSurface->>CLI: PromptSubmission
    CLI->>CLI: normalize & dispatch
    CLI->>Agent: run_prompt_with_attachments()
    Agent->>EventStream: emit streaming deltas (TextPart, ToolCall, Shell, etc.)
    EventStream->>EventStream: is prompt-surface active?
    alt prompt-surface active
        EventStream->>PromptSurface: set_ephemeral_status / set_ephemeral_preview
        PromptSurface->>PromptSurface: update preview / invalidate UI
    else no prompt-surface
        EventStream->>Renderer: print to console (streaming)
        Renderer->>User: visible output
    end
    Agent->>EventStream: final AGENT RESPONSE
    EventStream->>PromptSurface: clear ephemeral status/preview
    EventStream->>Renderer: render final AGENT RESPONSE above prompt (via run_above_prompt)
    Renderer->>User: rendered final response (durable transcript)
    CLI->>PromptSurface: ready for next input
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
code_puppy/plugins/antigravity_oauth/register_callbacks.py (1)
Lines 258-266: ⚠️ Potential issue | 🟠 Major

Only treat auth as successful after model configuration succeeds.

Lines 258-266 merely warn when `add_models_to_config()` fails, but `_perform_authentication()` still returns `True`. That makes Line 280 switch to `antigravity-gemini-3-pro-high` even on a first-time auth where that model was never registered, leaving the session pointed at an unusable model. Also applies to: lines 272-282.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@code_puppy/plugins/antigravity_oauth/register_callbacks.py` around lines 258 - 266, The auth flow currently treats authentication as successful even if add_models_to_config(...) fails; update _perform_authentication to only return True when both the auth step and add_models_to_config succeed — if add_models_to_config returns False, emit the warning (emit_warning) and return False so the session does not switch to an unregistered model; locate the block that calls add_models_to_config and adjust the control flow so success requires both the access token/project_id handling and the model registration call (refer to add_models_to_config, _perform_authentication, emit_success, emit_warning, and emit_info to find the correct spot).
🧹 Nitpick comments (2)
tests/tools/test_file_modifications_extended.py (1)
Lines 225-244: Good test coverage for the validation path.

The test correctly verifies the `KeyError` handling when `old_str` is missing. The mock Agent pattern effectively captures the registered tool for direct invocation. Consider adding companion tests for related validation scenarios:
🧪 Optional: Additional test cases for fuller coverage

```python
def test_register_replace_in_file_rejects_missing_new_str(self, tmp_path):
    registered = {}

    class Agent:
        def tool(self, fn):
            registered[fn.__name__] = fn
            return fn

    register_replace_in_file(Agent())
    fn = registered["replace_in_file"]
    result = fn(
        Mock(),
        file_path=str(tmp_path / "test.py"),
        replacements=[{"old_str": "original"}],  # missing new_str
    )
    assert result["success"] is False
    assert result["changed"] is False
    assert "new_str" in result["message"]


def test_register_replace_in_file_rejects_non_string_values(self, tmp_path):
    registered = {}

    class Agent:
        def tool(self, fn):
            registered[fn.__name__] = fn
            return fn

    register_replace_in_file(Agent())
    fn = registered["replace_in_file"]
    result = fn(
        Mock(),
        file_path=str(tmp_path / "test.py"),
        replacements=[{"old_str": 123, "new_str": "updated"}],  # non-string old_str
    )
    assert result["success"] is False
    assert result["changed"] is False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/tools/test_file_modifications_extended.py` around lines 225 - 244, Add companion tests to cover the other validation paths for register_replace_in_file: create two new test functions named test_register_replace_in_file_rejects_missing_new_str and test_register_replace_in_file_rejects_non_string_values that mirror the existing test_register_replace_in_file_rejects_missing_old_str structure, register the tool via register_replace_in_file(Agent()), call the registered "replace_in_file" function with replacements missing "new_str" in the first test and with a non-string "old_str" (e.g., integer) in the second, and assert result["success"] is False, result["changed"] is False and that the error message contains "new_str" for the first test (and appropriate validation indication for the second).

code_puppy/command_line/core_commands.py (1)
Lines 176-218: Consider adding explicit return type annotation.

The return type was removed, but the function now returns either `bool` or `BackgroundInteractiveCommand`. Adding an explicit union type improves readability and IDE support.

📝 Suggested type annotation

```diff
-def handle_tutorial_command(command: str):
+def handle_tutorial_command(command: str) -> bool | BackgroundInteractiveCommand:
```
Verify each finding against the current code and only fix it if needed. In `@code_puppy/command_line/core_commands.py` around lines 176 - 218, The function handle_tutorial_command currently returns either a bool or a BackgroundInteractiveCommand but lacks a type annotation; update its signature to include an explicit return type (e.g., Union[bool, BackgroundInteractiveCommand]) and add the necessary typing import (from typing import Union) so IDEs and linters understand the union return type; keep the existing behavior and symbols (handle_tutorial_command and BackgroundInteractiveCommand) unchanged elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@code_puppy/agents/event_stream_handler.py`:
- Around line 140-150: The merge function _merge_tool_name can leak partial
streaming names (e.g. "agent_share_your_reasoning") before the final value
arrives; change _merge_tool_name to detect the special reasoning tool string
(e.g. "agent_share_your_reasoning") and avoid returning any intermediate/partial
concatenation that would be a prefix of that special name — instead keep
returning current_name until the merged result equals the full special name, and
only then return the full name; apply the same suppression logic in the other
merging code path noted (lines ~385-404) so partial prefixes of the reasoning
tool are never emitted.
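An illustrative sketch of the suppression this finding asks for. The real `_merge_tool_name` in event_stream_handler.py likely differs; the stateful merger below only shows the buffer-vs-display distinction the prompt describes:

```python
REASONING_TOOL = "agent_share_your_reasoning"

class ToolNameMerger:
    """Accumulates streamed name fragments but never exposes a partial
    prefix of the reasoning tool name (sketch only)."""

    def __init__(self) -> None:
        self._buffer = ""
        self._display = ""

    def feed(self, delta: str) -> str:
        self._buffer += delta or ""
        # Suppress display while the buffer is a strict prefix of the
        # special name; emit only once the full name has arrived.
        if self._buffer != REASONING_TOOL and REASONING_TOOL.startswith(self._buffer):
            return self._display
        self._display = self._buffer
        return self._display
```

Ordinary tool names pass through unchanged, while prefixes like `"agent_share"` stay hidden until the full reasoning tool name is assembled.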
In `@code_puppy/cli_runner.py`:
- Around line 572-575: The code registers a runtime with
register_active_interactive_runtime(runtime) but only clears it on the normal
exit path, which can leave a stale global runtime after exceptions; to fix,
ensure you always call clear_active_interactive_runtime(runtime) in a finally
block paired with the try that follows the register so the runtime is removed
regardless of errors (referencing PromptRuntimeState,
register_active_interactive_runtime, clear_active_interactive_runtime, and
runtime.mark_idle) — move or add a finally that invokes
clear_active_interactive_runtime(runtime) and performs any necessary cleanup
after runtime.mark_idle().
- Around line 1194-1202: The code currently ignores the return value of
BackgroundInteractiveCommand.run(), so even when it returns False (indicating
failure) the code still emits a "completed" success lifecycle; change the await
asyncio.to_thread(command_result.run, command_result.cancel_event) call to
capture the return value (e.g., result = await
asyncio.to_thread(command_result.run, command_result.cancel_event)) and only
call emit_interject_queue_lifecycle(runtime, "completed", item=source_item,
level="success") when the captured result is truthy and
command_result.cancel_event.is_set() is False (i.e., if result and not
command_result.cancel_event.is_set()) so failed runs that return False do not
emit a success completion.
- Around line 1289-1299: The code cancels the active run via
cancel_active_run("interject") before checking whether runtime.request_interject
succeeded, which can drop the current task if the queue is full; to fix, call
runtime.request_interject(stripped_task,
allow_command_dispatch=allow_command_dispatch) first and inspect its ok result
(and position/item), and only call cancel_active_run("interject") if ok is True;
if ok is False, call emit_warning with the queue limit message (using
get_queue_limit()) and do not cancel the active run. Ensure this change is
applied where requested_action == "interject" and references the same symbols
(requested_action, runtime.request_interject, cancel_active_run, emit_warning,
get_queue_limit).
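The order-of-operations fix can be sketched as: reserve the queue slot first, cancel only on success. The runtime below is a minimal stand-in; `request_interject`'s `(ok, position, item)` shape follows the review comment:

```python
import asyncio

class FakeRuntime:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.items = []

    def request_interject(self, text: str):
        if len(self.items) >= self.limit:
            return False, None, None
        self.items.append(text)
        return True, len(self.items), text

async def handle_interject(runtime, task_text: str, events: list) -> None:
    # Ask for a slot BEFORE cancelling anything.
    ok, position, item = runtime.request_interject(task_text)
    if not ok:
        events.append(f"warn: queue full ({runtime.limit})")
        return  # the in-flight run keeps going untouched
    events.append("cancel_active_run")  # safe: the interject has a slot
```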
- Around line 1859-1867: The code treats the return of
run_prompt_with_attachments as a single object but that function returns a tuple
(result, task); change the call to unpack it (e.g., response, task = await
run_prompt_with_attachments(...)), then preserve the existing None check on
response (if response is None: return) before using response.output to set
agent_response; ensure prompt-only mode which returns (None, None) is handled by
the unpack and early return to avoid accessing response.output when response is
None.
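The unpacking fix in sketch form. The stand-in below assumes only the contract the comment states: `run_prompt_with_attachments` returns a `(result, task)` tuple, and prompt-only mode returns `(None, None)`:

```python
import asyncio
from types import SimpleNamespace

async def run_prompt_with_attachments(prompt: str):
    # Stand-in: prompt-only mode yields (None, None)
    if not prompt:
        return None, None
    return SimpleNamespace(output=f"echo: {prompt}"), object()

async def dispatch(prompt: str):
    # Unpack the tuple instead of treating it as a single object.
    response, task = await run_prompt_with_attachments(prompt)
    if response is None:  # guard before touching response.output
        return None
    return response.output
```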
In `@code_puppy/tools/command_runner.py`:
- Around line 1218-1225: The call to
get_active_interactive_runtime().notify_shell_started() is performed before
entering the try/finally, so if notify_shell_started or later
notify_shell_finished raises, it can short-circuit spinner/keyboard cleanup;
move notify_shell_started() into the main try block after acquiring keyboard
context (or after deciding release_keyboard_context) and ensure
notify_shell_finished() is invoked from the finally block but wrapped in its own
try/except to swallow/log any exceptions so they don't prevent cleanup;
reference get_active_interactive_runtime, notify_shell_started,
notify_shell_finished, _acquire_keyboard_context, and the
release_keyboard_context flag when applying the change.
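The hardened notify ordering looks roughly like this. The runtime and callbacks are stand-ins for `get_active_interactive_runtime()` and the spinner/keyboard cleanup referenced above:

```python
def run_shell_command(runtime, execute, cleanup) -> None:
    try:
        # Inside the try, so a raise here still reaches cleanup.
        runtime.notify_shell_started()
        execute()
    finally:
        try:
            runtime.notify_shell_finished()
        except Exception:
            pass  # never let notify failures block cleanup
        cleanup()
```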
- Around line 288-295: The _normalize_shell_cwd helper currently strips
leading/trailing whitespace which alters valid paths; change it to only collapse
whitespace-only inputs to None by returning None when cwd is None or cwd.strip()
is empty, otherwise return the original cwd unchanged (preserving any
intentional leading/trailing spaces) so callers receive the original non-empty
value.
In `@tests/test_cli_runner_full_coverage.py`:
- Around line 130-142: The prompt_side_effect helper currently wraps every
non-PromptSubmission value into PromptSubmission(action="submit", text=value),
which converts a test-provided None (meant to represent cancellation) into a
submission; modify prompt_side_effect so that after resolving value from
input_fn it returns None unchanged if value is None, returns the value unchanged
if isinstance(value, PromptSubmission), and only otherwise wraps into
PromptSubmission(action="submit", text=value); reference prompt_side_effect,
input_fn and PromptSubmission when making the change.
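A sketch of the fixed helper: `None` (cancellation) passes through unchanged, existing `PromptSubmission` objects pass through, and only plain values get wrapped. `PromptSubmission` here is a minimal stand-in for the real class:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromptSubmission:
    action: str
    text: str

def prompt_side_effect(input_fn) -> Optional[PromptSubmission]:
    value = input_fn()
    if value is None:
        return None  # preserve cancellation instead of wrapping it
    if isinstance(value, PromptSubmission):
        return value
    return PromptSubmission(action="submit", text=value)
```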
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 05a39e5c-5f5a-4d69-9686-b49566313c54
📒 Files selected for processing (57)
- IMPLEMENTATION_GUARDRAILS.md
- code_puppy/agents/base_agent.py
- code_puppy/agents/event_stream_handler.py
- code_puppy/cli_runner.py
- code_puppy/command_line/command_handler.py
- code_puppy/command_line/config_commands.py
- code_puppy/command_line/core_commands.py
- code_puppy/command_line/interactive_command.py
- code_puppy/command_line/interactive_runtime.py
- code_puppy/command_line/prompt_toolkit_completion.py
- code_puppy/config.py
- code_puppy/messaging/legacy_bridge.py
- code_puppy/messaging/messages.py
- code_puppy/messaging/renderers.py
- code_puppy/messaging/rich_renderer.py
- code_puppy/messaging/spinner/__init__.py
- code_puppy/messaging/spinner/console_spinner.py
- code_puppy/plugins/antigravity_oauth/register_callbacks.py
- code_puppy/plugins/chatgpt_oauth/oauth_flow.py
- code_puppy/plugins/chatgpt_oauth/register_callbacks.py
- code_puppy/plugins/claude_code_oauth/register_callbacks.py
- code_puppy/plugins/oauth_control.py
- code_puppy/terminal_utils.py
- code_puppy/tools/agent_tools.py
- code_puppy/tools/command_runner.py
- code_puppy/tools/file_modifications.py
- docs/INTERACTIVE_REGRESSION_CHECKLIST.md
- tests/agents/test_event_stream_handler.py
- tests/command_line/test_add_model_menu_coverage.py
- tests/command_line/test_config_commands_full_coverage.py
- tests/command_line/test_core_commands_full_coverage.py
- tests/command_line/test_prompt_toolkit_coverage.py
- tests/command_line/test_tutorial.py
- tests/messaging/spinner/test_spinner_init.py
- tests/messaging/test_legacy_bridge.py
- tests/messaging/test_rich_renderer.py
- tests/plugins/conftest.py
- tests/plugins/test_antigravity_callbacks_coverage.py
- tests/plugins/test_antigravity_register_callbacks.py
- tests/plugins/test_chatgpt_oauth_coverage.py
- tests/plugins/test_chatgpt_oauth_integration.py
- tests/plugins/test_claude_code_oauth_callbacks.py
- tests/plugins/test_claude_code_oauth_coverage.py
- tests/test_agent_tools_coverage.py
- tests/test_cli_runner_coverage.py
- tests/test_cli_runner_full_coverage.py
- tests/test_command_overhaul_targeted.py
- tests/test_config.py
- tests/test_config_and_storage_edge_cases.py
- tests/test_config_full_coverage.py
- tests/test_console_spinner_coverage.py
- tests/test_prompt_toolkit_completion.py
- tests/test_terminal_utils.py
- tests/test_terminal_utils_comprehensive.py
- tests/tools/browser/test_remaining_coverage.py
- tests/tools/test_command_runner_full_coverage.py
- tests/tools/test_file_modifications_extended.py
```python
if requested_action == "interject":
    log_event("interject_banner", text=stripped_task)
    await cancel_active_run("interject")
    ok, position, item = runtime.request_interject(
        stripped_task,
        allow_command_dispatch=allow_command_dispatch,
    )
    if not ok:
        emit_warning(
            f"Queue full ({get_queue_limit()}). Cannot interject right now."
        )
```
Don't cancel the current run before you know the interject fits.
At Line 1291 the active task is cancelled before Lines 1292-1296 check queue capacity. If the queue is already full, the interject is rejected but the user's in-flight work has already been stopped.
fix(runtime): address review correctness issues
🧹 Nitpick comments (2)
tests/test_cli_runner_full_coverage.py (2)
Lines 1521-1528: Replace long fixed sleep with event-driven blocking to reduce CI flakiness.

At Line 1527, `await asyncio.sleep(10)` makes this test unnecessarily slow and fragile if cancellation timing shifts. Prefer waiting on an event/future that is cancelled/fulfilled by the test flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cli_runner_full_coverage.py` around lines 1521 - 1528, The test uses a long fixed sleep inside the fake_run coroutine (function fake_run) when handling the "first task" prompt, which should be replaced by waiting on an event to avoid timing flakiness: create an asyncio.Event or Future (e.g., first_task_continue) and in fake_run, replace await asyncio.sleep(10) with await first_task_continue.wait() (or awaiting the Future); then signal or cancel that event/future from the test flow when you want the first task to continue or be cancelled (use the existing first_task_started Event to know when to trigger it). Update references to first_task_started and any assertions around render_order/started_prompts so the test coordinates via the new first_task_continue event instead of a fixed sleep.
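The event-driven pattern suggested here, in a self-contained sketch. The `first_task_continue` name follows the review prompt; the rest is illustrative:

```python
import asyncio

async def demo() -> list:
    order = []
    first_task_continue = asyncio.Event()

    async def fake_run(prompt: str) -> None:
        order.append(f"start:{prompt}")
        if prompt == "first task":
            # Block on an event instead of a fixed asyncio.sleep(10).
            await first_task_continue.wait()
        order.append(f"done:{prompt}")

    task = asyncio.create_task(fake_run("first task"))
    await asyncio.sleep(0)       # let the task reach the wait point
    first_task_continue.set()    # release it deterministically
    await task
    return order
```

The test now completes as soon as the event fires, with no dependence on wall-clock timing.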
Lines 190-192: Tighten lifecycle tests to assert single emission, not just payload text.

These tests currently validate emitted text but not multiplicity. If duplicate lifecycle emits are introduced, they would still pass. Add an explicit single-call assertion in each positive case.

Suggested assertion hardening

```diff
 emitted = message_bus.emit.call_args[0][0]
 assert emitted.text == "[INTERJECT] stopping current work: steer now"
+message_bus.emit.assert_called_once()
```

Apply similarly to the queue and command-completion lifecycle tests.
Also applies to: 212-214, 335-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_cli_runner_full_coverage.py` around lines 190 - 192, The test currently only checks the emitted payload text but not that the emitter was invoked exactly once; update the test(s) that inspect message_bus.emit (e.g., the block capturing emitted = message_bus.emit.call_args[0][0]) to also assert a single emission using message_bus.emit.assert_called_once() (or assert message_bus.emit.call_count == 1) before examining emitted.text, and apply the same single-call assertion to the similar cases at the other locations (the queue and command-completion lifecycle tests).
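The tightened assertion pattern, sketched as a small helper (`check_single_emit` is a hypothetical name used here for illustration, not a symbol from the PR):

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

def check_single_emit(message_bus) -> str:
    # Fail fast if the lifecycle event was emitted more (or fewer)
    # than once, before inspecting the payload text.
    message_bus.emit.assert_called_once()
    emitted = message_bus.emit.call_args[0][0]
    return emitted.text
```

A duplicate emission now fails the multiplicity check instead of slipping through on matching text.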
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4c3081e1-63b5-4fa1-8bb4-d0550322930e
📒 Files selected for processing (10)
code_puppy/agents/event_stream_handler.pycode_puppy/cli_runner.pycode_puppy/plugins/antigravity_oauth/register_callbacks.pycode_puppy/tools/command_runner.pytests/agents/test_event_stream_handler.pytests/plugins/test_antigravity_callbacks_coverage.pytests/plugins/test_antigravity_register_callbacks.pytests/test_cli_runner_coverage.pytests/test_cli_runner_full_coverage.pytests/tools/test_command_runner_full_coverage.py
🚧 Files skipped from review as they are similar to previous changes (2)
- code_puppy/tools/command_runner.py
- code_puppy/plugins/antigravity_oauth/register_callbacks.py
Demo Video
Interactive runtime demo video
Summary
This PR introduces an interactive runtime that lets the user queue prompts and send interjects while their Code Puppy is already running.
The core goal is smooth live interaction across the runtime surfaces that matter in real use:
I tested these surfaces personally across macOS and Windows, and the interactive runtime / state handling feels solid.
The latest updates also address the CodeRabbit correctness comments without broadening the design:

- run_prompt_with_attachments results are unpacked as `(response, task)`
- background commands only emit `completed` on real success

What this introduces
- validation of `replace_in_file` payloads
I tested:
Guardrails and regression coverage
The guardrails and regression coverage for this work enforce a few core runtime rules:
- `AGENT RESPONSE` still renders once
This is not a broad product redesign. It introduces the interactive runtime and makes the live combinations between prompt, queue, interject, shell, Wiggum, and parallel agents behave correctly.
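The queue/interject model described above can be sketched at a high level like this. This is purely illustrative and not the PR's implementation; the class and method names are invented for the sketch:

```python
from collections import deque
from typing import Optional

class PromptQueue:
    """Prompts submitted while the agent is busy wait in a bounded
    queue; an interject jumps to the front (sketch only)."""

    def __init__(self, limit: int = 5) -> None:
        self.limit = limit
        self._items = deque()

    def enqueue(self, text: str) -> bool:
        if len(self._items) >= self.limit:
            return False  # queue full: reject rather than drop work
        self._items.append(text)
        return True

    def interject(self, text: str) -> bool:
        if len(self._items) >= self.limit:
            return False
        self._items.appendleft(text)  # runs before queued prompts
        return True

    def next_prompt(self) -> Optional[str]:
        return self._items.popleft() if self._items else None
```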
Summary by CodeRabbit
New Features
Bug Fixes
Documentation